With advances in computational load distribution, neural networks have become an extremely powerful class of machine learning algorithms. We will focus on arguably the simplest type of neural network: the multilayer perceptron (MLP). Neural network architectures are inspired by the brain's network of neurons. A biological neuron accepts information (neurotransmitters) at its dendrites until a certain threshold is met, then sends an electrical impulse down its axon, where its own neurotransmitters are released for the next neuron to receive. A neuron in a neural network takes an initial input, weights on that input, and a bias value as its total input, and uses matrix multiplication along with an activation function to output a new value. Like the brain, these neurons can pass information on to several other neurons across several layers until the algorithm reaches a decision. These added layers are called hidden layers; we can add as many as we'd like, but there is a trade-off between model performance and the speed of the algorithm. Unlike the brain, however, the process does not end when the algorithm reaches its initial prediction. A neural network repeatedly adjusts its weights to optimize them for the training data it is given and improve its predictions. It does this by using partial derivatives to minimize the loss of the function it is trying to approximate.
Even more so than other machine learning algorithms, neural networks require experimentation to optimize their architecture. Understanding their parameters, however, can help guide initial design decisions so that not too much time is wasted on computational brute force. There is still a lot to learn about how neural networks actually learn and how best to apply them outside of very specific problems without experimenting first.
We will build a neural network on data from the Large Hadron Collider (LHC) at CERN to classify whether a measurement represents the relatively recently discovered Higgs boson decaying to tau leptons or is merely background. The LHC yields an unfathomable amount of data, and accordingly this data set is so large that it is not practical to load it all into memory at once. At 2.75 gigabytes with 11 million rows, instead of loading the entire dataset into memory we will subset it and build our neural network from 2.5 million rows of input.
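Reading only a slice of a large CSV is done with pandas' `nrows` and `skiprows` arguments, the approach used later in this analysis. Below is a minimal sketch on a tiny in-memory stand-in for the real file (the column layout mirrors HIGGS.csv: label in column 0, features after; the miniature data itself is purely illustrative):

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for HIGGS.csv: label in column 0, features after.
csv_text = "\n".join(f"{i % 2},{i * 0.1},{i * 0.2}" for i in range(100))

# Read only the first N rows for training, then skip ahead for a held-out
# test slice, mirroring how a file too large for memory can be subset.
train = pd.read_csv(io.StringIO(csv_text), nrows=80, header=None)
test = pd.read_csv(io.StringIO(csv_text), nrows=10, header=None, skiprows=90)

print(train.shape)  # (80, 3)
print(test.shape)   # (10, 3)
```

The same pattern scales to the full 11-million-row file: only the requested rows are parsed into memory.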
To find an optimal neural network for this data we will build the architecture one portion at a time. First, we will tune a network with just one hidden layer; this lets us focus on how many nodes to use and which activation function works best for this problem. Thereafter, using a systematic approach, we will increase the number of hidden layers, vary the activation functions further, adjust batch sizes, and try different kernel initializers and optimizers to see how each of these aspects influences our model.
A hidden layer receives input from the previous layer (the input layer if it is the first hidden layer, otherwise the preceding hidden layer) and sends its output to the next layer (either the next hidden layer or the final output layer). Hidden layers are made up of neurons (nodes) that receive a bias value plus all of the previous layer's outputs multiplied by their associated weights. Each neuron applies an activation function to this weighted sum to produce the value it passes on to the next layer. Adding more hidden layers can give the model more power, but this comes at the cost of computation time, since each epoch requires more math.
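The weighted-sum-plus-activation step described above can be sketched in a few lines of NumPy (a toy illustration with made-up shapes, not the model built later):

```python
import numpy as np

def dense_forward(x, W, b, activation):
    """One hidden layer: weighted sum of inputs plus bias, then an activation."""
    return activation(x @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # batch of 4 inputs, 3 features each
W = rng.normal(size=(3, 5))   # weights: 3 inputs -> 5 neurons
b = np.zeros(5)               # one bias per neuron
relu = lambda z: np.maximum(z, 0.0)

out = dense_forward(x, W, b, relu)
print(out.shape)  # (4, 5): each input row now has one value per neuron
```

Stacking hidden layers is just feeding `out` into another `dense_forward` call with its own weights and bias.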
Activation functions are mathematical functions that map an input value to an output value. The variation in each activation function allows outputs to be mapped to different ranges to accomplish different results, which allows for the construction of both linear and non-linear models. For our purposes, we will only focus on the non-linear functions.
The Sigmoid function takes any real number as input and outputs a value between 0 and 1. It is commonly used to output a probability for exactly that reason.
$\phi(z) = \frac{1}{(1+e^{-z})}$
The Tanh function is very similar to the Sigmoid function but instead maps outputs to $[-1, 1]$, allowing negative inputs to remain negative. Both tanh and sigmoid can be used as output activations for binary classification.
$\phi(z) = \frac{\sinh(z)}{\cosh(z)}$
The ReLU function produces an output in $[0, \infty)$. If the input is less than zero, the output is zero; if the input is positive, the value is passed through unchanged. The ReLU function also helps avoid the vanishing gradient problem.
$\phi(z) = \begin{cases} z, & z > 0 \\ 0, & z \le 0 \end{cases}$
The ELU function is very similar to the ReLU activation function, except that instead of mapping negative inputs to zero, it maps them smoothly toward a value close to $-a$. The difference between ELU and SELU is an additional scaling factor applied to the output; without that scaling, the function is simply referred to as ELU.
$\phi(z) = \begin{cases} z, & z > 0 \\ a(e^{z}-1), & z \le 0 \end{cases}$
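To make the four mappings above concrete, here is a quick NumPy sketch of each non-linear activation (illustrative only; Keras provides these built in):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # output in (0, 1)

def tanh(z):
    return np.sinh(z) / np.cosh(z)   # output in (-1, 1)

def relu(z):
    return np.where(z > 0, z, 0.0)   # negatives clipped to 0

def elu(z, a=1.0):
    return np.where(z > 0, z, a * (np.exp(z) - 1.0))  # negatives approach -a

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(tanh(z))
print(relu(z))
print(elu(z))
```

SELU is the same ELU curve multiplied by a fixed scale factor, so it is omitted here.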
Kernel initialization functions assign the initial weights of each layer. As the model steps through each iteration, these weights are adjusted accordingly. The initial values chosen affect how quickly the model will converge (if it converges at all) on a minimum of the loss function.
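As one concrete example of how an initializer scales weights to the layer, the Glorot (Xavier) normal initializer used later draws values with standard deviation $\sqrt{2/(fan_{in}+fan_{out})}$. A simplified NumPy sketch (it omits the truncation that Keras applies to the normal distribution):

```python
import numpy as np

def glorot_normal_sketch(fan_in, fan_out, rng=None):
    """Approximate Glorot/Xavier normal init: the stddev shrinks as the layer
    grows, keeping activation and gradient variance comparable across layers.
    Simplified: the real initializer truncates the normal distribution."""
    if rng is None:
        rng = np.random.default_rng(0)
    stddev = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, stddev, size=(fan_in, fan_out))

W = glorot_normal_sketch(28, 100)  # e.g. the 28 HIGGS features into 100 neurons
print(W.shape)
print(round(W.std(), 3))  # close to sqrt(2/128) ~ 0.125
```

lecun_normal follows the same idea but scales by $\sqrt{1/fan_{in}}$ only.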
The purpose of the optimization function is to determine how the weights at each layer should be adjusted (increased/decreased) to continue to minimize the loss function. The most basic tuning parameter is the learning rate.
The learning rate controls how much each weight is increased or decreased at each iteration. If the step is too large, the value of the loss function may actually increase; if it is too small, the model will need many iterations before it reaches the minimum of the loss function. The proper balance is a learning rate that reaches the minimum value of the loss function in the least amount of time/iterations.
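The trade-off can be seen on a one-dimensional toy loss, $f(w) = (w-3)^2$ (an illustration only, not part of the model):

```python
# Gradient descent on f(w) = (w - 3)^2 with different learning rates:
# a small rate converges slowly, a too-large rate overshoots and diverges.
def descend(lr, steps=50, w=0.0):
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of the loss at w
        w -= lr * grad          # step against the gradient
    return w

print(descend(0.1))        # near the minimum at w = 3
print(descend(0.001))      # still far from 3 after the same number of steps
print(abs(descend(1.05)))  # steps overshoot and grow without bound
```

The adaptive optimizers compared later (Adagrad, Adadelta, Adam, Nadam) all try to tune this step size automatically.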
The following optimization functions are evaluated: Adagrad, Adadelta, Adam, and Nadam.
The batch size determines the number of data points propagated through the neural network at a time; eventually all data points are fed through the network, completing one epoch. The advantage of a smaller batch size is that we can work through large data sets in chunks that would otherwise not fit into memory, and in some instances the network also trains faster. The disadvantage is that each update sees a much smaller portion of the data, sacrificing accuracy of the true gradient.
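A sketch of how mini-batches partition the training data into one epoch (illustrative only; Keras handles this internally through the `batch_size` argument to `fit`):

```python
import numpy as np

def minibatches(x, y, batch_size, rng):
    """Yield shuffled mini-batches; one full pass over the data is one epoch."""
    idx = rng.permutation(len(x))              # shuffle row order each epoch
    for start in range(0, len(x), batch_size):
        take = idx[start:start + batch_size]   # next chunk of shuffled indices
        yield x[take], y[take]

rng = np.random.default_rng(0)
x = np.arange(20.0).reshape(10, 2)
y = np.arange(10)
batches = list(minibatches(x, y, batch_size=4, rng=rng))
print(len(batches))                  # 3 batches: sizes 4, 4, 2
print([len(b[0]) for b in batches])
```

With 2.5 million rows and a batch size of 1000, one epoch therefore performs 2500 weight updates.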
First we will look at how a neural network is affected when we alter the number of neurons in one hidden layer. We will do this with several different activation functions to see whether the effect differs between them.
plotActFunction()
Figure 1 - Single Hidden Layer Model. Holding the number of hidden layers constant at one, 10, 20, 50, and 100 neurons were used with different activation functions.
ReLU had the most varied accuracy across the range of neuron counts. Ten and 50 neurons were approximately equal, while accuracy decreased at 20 neurons and dropped starkly at 100. For the sigmoid activation function there was a drastic loss in accuracy after 10 neurons. Tanh showed diminishing returns after 20 neurons, although its accuracy did increase steadily as neurons were added. ELU increased its accuracy from 65% at 10 neurons to 70% at 20, then dipped to 66% at 50 before returning to 70% at 100. SELU experienced a dip in accuracy after 10 neurons and did not recover until 100 neurons were used.
This shows that, with one hidden layer, adding neurons is not a guaranteed cure for low accuracy; the effect varies depending on which activation function is used.
We will now test the same question as before, but hold our hidden layers constant at two while varying the nodes and activation functions.
plotActNodeFunction()
Figure 2 - Two Hidden Layer Model. Holding the number of hidden layers constant at two, 10, 20, 50, and 100 neurons were used with different activation functions.
Compared to one hidden layer, two hidden layers seem detrimental to accuracy as neurons increase for most activation functions. ReLU held a very consistent accuracy as neurons increased. Sigmoid had the most varied response, dramatically increasing its accuracy by 21% from 20 to 50 neurons; accuracy then dropped almost as drastically at 100 neurons. Tanh was fairly constant as well, with a slight decrease at 20 neurons. ELU gained 7% going from 10 to 20 neurons, but lost accuracy thereafter at 50 and 100. SELU's best accuracy was with 10 neurons, and it dropped drastically as neurons increased.
As with one hidden layer, increasing the number of neurons is not a surefire way to increase the accuracy of your model, and the outcome depends on the activation function as well. Now we can plot each activation function to compare one versus two hidden layers and see how many neurons improve the model score.
plotRelu()
plotSigmoid()
plotTanh()
plotELU()
plotSELU()
Figure 3 - Comparing 1 and 2 hidden layer neural networks with various neuron amounts.
Holding the number of neurons and the activation function constant to compare one versus two hidden layers, we see there is no consistent increase in accuracy. ReLU holds a near-constant accuracy with two hidden layers, while with one hidden layer accuracy drops off drastically as it approaches 100 neurons. Sigmoid has by far its best accuracy at 10 neurons with one hidden layer and at 50 neurons with two. Tanh trends similarly with one and two hidden layers, except at 20 neurons in the two-layer model. ELU's scores for one and two hidden layers are fairly similar until 100 neurons, where one hidden layer excels while two plummets. Lastly, SELU showed a massive drop in accuracy with two layers compared to one as neurons increased.
Next we took one of the best models we had created so far and varied the batch size to see how that would affect our accuracy score. The model we chose was a two-hidden-layer network with 100 neurons in each layer, using ReLU as our activation function. We chose batch sizes of 500, 1000, 10000, and 100000.
plotBatch()
Figure 4 - Varying batch sizes for our previous best model. Batch sizes used were: 500, 1000, 10000, 100000.
We found the optimal batch size for this model to be 1000, meaning 0.04% of our data is loaded into the model at a time. It outperformed our next best batch size by nearly 4 percentage points.
Kernel initializers alter how the initial weights are created, so whichever one creates the best initial weights gives that model a 'head start' in converging to the best score it can yield. We'll be testing random_normal, lecun_normal, and glorot_normal.
plotKernel()
Figure 5 - Varying Kernel Initializers. Holding all other parameters constant, we tested our model's accuracy with random_normal, lecun_normal, and glorot_normal.
The difference in accuracy between lecun_normal (75.25%) and glorot_normal (75.91%) was fairly small at 0.66 percentage points. Behind those was random_normal at 71.28% accuracy.
Our last test features optimizers that adapt the learning rate as training progresses. We'll take the best model we've found so far and test the Adagrad, Adadelta, Adam, and Nadam optimizers.
plotOptimizer()
Figure 6 - Varying Optimizers. Holding all other parameters constant, we ran models with the Adagrad, Adadelta, Adam, and Nadam optimizers.
Of the optimizers we tested, Adadelta performed best (79.82% accuracy) on our data, beating the next best optimizer (Adagrad, 73.00%) by 6.82 percentage points. Adam and Nadam produced 68.47% and 48.70% accuracy respectively.
Using the results gathered so far, we will try to build the best model we can: a two-hidden-layer neural network with 100 neurons per layer, the ReLU activation function, the glorot_normal kernel initializer, a batch size of 1000, and Adadelta as our optimizer. Now that we've chosen our parameters, we will also bring in more data (10.5 million rows) and increase our epochs to let the network train longer on the most data we can currently give it.
plotMetric(history)
for key, val in bestResult.items():
    print(f"{key}: {val}")
Figure 7 - Building our optimal model.
By loading in more data and increasing our epochs to 20, our optimal model yielded 83.11% accuracy. Together these changes increased our previous highest accuracy by 3.29 percentage points.
One of the main questions we wanted to answer was: what is the effect of adding more hidden layers and/or neurons to a neural network? We tackled both by holding the number of hidden layers constant while increasing the neurons, and the inverse as well. We did this for several different activation functions and found no clear trend across activation functions that increasing the number of neurons or the number of hidden layers yields a better model. The effect of changing neuron and hidden layer counts also seems very sensitive to the other parameters. Further analysis could increase the epochs of the tests we ran and/or the amount of data to validate these results.
Next, we consider which parameter values gave us the best result, and why we think that is.
With 100 nodes per layer, we believe the model was sufficiently complex without overfitting the data. The ReLU function performed best across all test cases; sources suggest this is because its piecewise-linear nature avoids the vanishing gradient problem, and because it produces sparse activations, which tend to perform better than dense ones. The glorot_normal kernel initializer draws values from a truncated normal distribution whose scale takes into account the number of input and output nodes, so the initial weights are at least partially matched to our model. The Adadelta optimizer uses an adaptive learning rate for each update, which allows it to shrink its step size as it approaches a minimum. In addition, Adadelta builds on Adagrad by limiting the window over which it looks back at previous gradient calculations; this yields a more localized gradient estimate around the current input instead of an accumulation of gradient values from far earlier in training. A batch size of 1000 let the weights update frequently while still using enough data points to calculate the gradient: a smaller value would have given too little data for an accurate gradient, and a larger batch size would have prevented the model from updating as often. Had we increased the number of epochs, other batch sizes might have performed better, but with our parameters 1000 was the ideal compromise.
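The Adadelta behavior described above can be sketched on a toy one-dimensional loss. This is a simplified version of the update (variable names are ours, and the decay constant and epsilon are conventional defaults, not values from our Keras runs):

```python
import numpy as np

def adadelta_step(w, grad, state, rho=0.95, eps=1e-6):
    """One simplified Adadelta update: decaying averages of squared gradients
    and squared updates stand in for a fixed learning rate, so the step size
    adapts as training progresses."""
    Eg2, Edx2 = state
    Eg2 = rho * Eg2 + (1 - rho) * grad ** 2          # windowed gradient history
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2          # windowed update history
    return w + dx, (Eg2, Edx2)

# Minimize f(w) = (w - 3)^2 starting from w = 0, with no learning rate to tune.
w, state = 0.0, (0.0, 0.0)
for _ in range(5000):
    w, state = adadelta_step(w, 2.0 * (w - 3.0), state)
print(round(w, 2))
```

Note how no learning rate appears anywhere: the ratio of the two running averages plays that role.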
Lastly, how did we decide our model was done? We made this decision after altering one parameter at a time while holding all others constant to find the values that seemed to work best for our model. That led us to the parameters we thought would yield the optimal model when combined, at which point we could increase the amount of data fed into the model and the number of epochs to get the most accuracy we could. Further code should be implemented to increase the epoch count drastically and stop training when the accuracy change per epoch becomes negligible or accuracy begins to decrease. In a real-world setting, time constraints can be a major factor in deciding when a neural network model is complete: training very complex networks on enormous amounts of data is costly in computation time, and in actual money if cloud services are used to help with the load, which affects the return on investment (ROI) of fully optimizing the model.
import numpy as np
import pandas as pd
import tensorflow as tf
import keras
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import multi_gpu_model
from keras.optimizers import SGD
from keras.initializers import random_normal, random_uniform, VarianceScaling, lecun_normal, lecun_uniform
from keras.initializers import glorot_normal, glorot_uniform
from sklearn.metrics import roc_auc_score
#Read in 2.5 Million rows. Split into train and test data set
N = 2500000  #Change this line to adjust the number of rows.
data=pd.read_csv("HIGGS.csv", nrows=N, header=None)
test_data=pd.read_csv("HIGGS.csv", nrows=500000, header=None, skiprows=10500000)
y = np.array(data.loc[:,0])
x = np.array(data.loc[:,1:])
x_test = np.array(test_data.loc[:,1:])
y_test = np.array(test_data.loc[:,0])
#global functions
def plotMetric(history):
    # plot loss during training
    pyplot.subplot(211)
    pyplot.title('Loss')
    pyplot.xlabel('Epoch')
    pyplot.ylabel('Metric Result')
    pyplot.plot(history.history['loss'], label='train')
    pyplot.plot(history.history['val_loss'], label='test')
    pyplot.legend()
    # plot accuracy during training
    pyplot.subplot(212)
    pyplot.title('Accuracy')
    pyplot.xlabel('Epoch')
    pyplot.ylabel('Metric Result')
    pyplot.plot(history.history['acc'], label='train')
    pyplot.plot(history.history['val_acc'], label='test')
    pyplot.legend()
    pyplot.tight_layout()
    pyplot.show()

def plotActFunction():
    q1 = pd.DataFrame({
        'relu': [0.7494714761165963, 0.6844046501901166, 0.7520676569834632, 0.5022345157624841],
        'sigmoid': [0.702603151005859, 0.4992187222113161, 0.49587507436517203, 0.5114362046742831],
        'tanh': [0.6708376844572501, 0.7073781358798696, 0.7280739489592952, 0.7368565712655764],
        'elu': [0.6488802924075708, 0.6967709503111617, 0.6606804661843719, 0.7033196371850748],
        'selu': [0.739319745642502, 0.7141225114864298, 0.6946993338631965, 0.734129602507341],
    }, index=[10, 20, 50, 100])
    lines1 = q1.plot.line(title='Activation Function Model Performance')
    lines1.set_xlabel("Nodes")
    lines1.set_ylabel("ROC AUC Score")

def plotActNodeFunction():
    q1b = pd.DataFrame({
        'relu': [0.7331820375082503, 0.7279145670888859, 0.7303996319364721, 0.7517170416945581],
        'sigmoid': [0.48112367787929267, 0.4999891272367009, 0.713154187089281, 0.5712550765297988],
        'tanh': [0.7208538403453445, 0.6777523198749522, 0.7526727166338352, 0.7395697262052479],
        'elu': [0.6616180458323012, 0.7368514242944441, 0.6529415325334464, 0.4630253523627044],
        'selu': [0.7406490611946785, 0.6932305203868041, 0.5421163972915666, 0.5136491201753131],
    }, index=[10, 20, 50, 100])
    lines2 = q1b.plot.line(title='Activation Function Model Performance')
    lines2.set_xlabel("Nodes")
    lines2.set_ylabel("ROC AUC Score")

def plotRelu():
    q2relu = pd.DataFrame({
        'relu-1': [0.7494714761165963, 0.6844046501901166, 0.7520676569834632, 0.5022345157624841],
        'relu-2': [0.7331820375082503, 0.7279145670888859, 0.7303996319364721, 0.7517170416945581],
    }, index=[10, 20, 50, 100])
    ax = q2relu.plot.line(title='ReLU Model Performance')
    ax.set_xlabel("Nodes")
    ax.set_ylabel("ROC AUC Score")

def plotSigmoid():
    q2sigmoid = pd.DataFrame({
        'sigmoid-1': [0.702603151005859, 0.4992187222113161, 0.49587507436517203, 0.5114362046742831],
        'sigmoid-2': [0.48112367787929267, 0.4999891272367009, 0.713154187089281, 0.5712550765297988],
    }, index=[10, 20, 50, 100])
    ax = q2sigmoid.plot.line(title='Sigmoid Model Performance')
    ax.set_xlabel("Nodes")
    ax.set_ylabel("ROC AUC Score")

def plotTanh():
    q2tanh = pd.DataFrame({
        'tanh-1': [0.6708376844572501, 0.7073781358798696, 0.7280739489592952, 0.7368565712655764],
        'tanh-2': [0.7208538403453445, 0.6777523198749522, 0.7526727166338352, 0.7395697262052479],
    }, index=[10, 20, 50, 100])
    ax = q2tanh.plot.line(title='Tanh Model Performance')
    ax.set_xlabel("Nodes")
    ax.set_ylabel("ROC AUC Score")

def plotELU():
    q2elu = pd.DataFrame({
        'elu-1': [0.6488802924075708, 0.6967709503111617, 0.6606804661843719, 0.7033196371850748],
        'elu-2': [0.6616180458323012, 0.7368514242944441, 0.6529415325334464, 0.4630253523627044],
    }, index=[10, 20, 50, 100])
    ax = q2elu.plot.line(title='ELU Model Performance')
    ax.set_xlabel("Nodes")
    ax.set_ylabel("ROC AUC Score")

def plotSELU():
    q2selu = pd.DataFrame({
        'selu-1': [0.739319745642502, 0.7141225114864298, 0.6946993338631965, 0.734129602507341],
        'selu-2': [0.7406490611946785, 0.6932305203868041, 0.5421163972915666, 0.5136491201753131],
    }, index=[10, 20, 50, 100])
    ax = q2selu.plot.line(title='SELU Model Performance')
    ax.set_xlabel("Nodes")
    ax.set_ylabel("ROC AUC Score")

def plotBatch():
    q3 = pd.DataFrame({
        ' ': [0.631143945362551, 0.7614658650320869, 0.7228425873031199, 0.6366954934942467],
    }, index=[500, 1000, 10000, 100000])
    ax = q3.plot.line(title='Batch Size vs Model Performance')
    ax.set_xlabel("Batch Size")
    ax.set_ylabel("ROC AUC Score")

def plotKernel():
    q4 = pd.DataFrame({
        'Score': [0.7128371162649332, 0.7524838324457825, 0.7591295447725338],
        'Kernel_Initializers': ['random_normal', 'lecun_normal', 'glorot_normal']})
    ax = q4.plot.bar(x='Kernel_Initializers', y='Score', rot=45, title='Kernel Initializers vs Model Performance')
    ax.set_xlabel("Kernel Initializer")
    ax.set_ylabel("ROC AUC Score")

def plotOptimizer():
    q5 = pd.DataFrame({
        'Score': [0.7299848975104806, 0.7981572106570257, 0.6846821732954556, 0.48695086234710605],
        'Optimizer': ['Adagrad', 'Adadelta', 'Adam', 'Nadam']})
    ax = q5.plot.bar(x='Optimizer', y='Score', rot=45, title='Optimizer vs Model Performance')
    ax.set_xlabel("Optimizer")
    ax.set_ylabel("ROC AUC Score")
bestResult = {}
bestScore = -1
nodes = [10, 20, 50, 100]
activation = ['relu', 'sigmoid', 'tanh', 'elu', 'selu']
for activate in activation:
    for node in nodes:
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1))
        model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer='adam')
        history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=.1)
        plotMetric(history)
        score = roc_auc_score(y_test, model.predict(x_test))
        print(f"Nodes: {node}")
        print(f"Activation: {activate}")
        print(f"Score: {score}")
        if score > bestScore:
            bestResult['Node'] = node
            bestResult['Activation'] = activate
            bestResult['Score'] = score
            bestScore = score
print(f"{bestResult}\n")
bestResult = {}
bestScore = -1
nodes = [10, 20, 50, 100]
activation = ['relu', 'sigmoid', 'tanh', 'elu', 'selu']
for activate in activation:
    for node in nodes:
        with tf.device("/cpu:0"):
            model = Sequential()
            model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
            model.add(Dense(node, activation=activate))
            model.add(Dense(node, activation=activate))
            model.add(Dense(1))
        model = multi_gpu_model(model, gpus=2)
        model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer='adam')
        history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=.1)
        plotMetric(history)
        score = roc_auc_score(y_test, model.predict(x_test))
        print(f"Nodes: {node}")
        print(f"Activation: {activate}")
        print(f"Score: {score}")
        if score > bestScore:
            bestResult['Node'] = node
            bestResult['Activation'] = activate
            bestResult['Score'] = score
            bestScore = score
print(f"{bestResult}\n")
bestResult = {}
bestScore = -1
node = 100
activate = 'relu'
optimize = 'Adam'
batchSize = [500, 1000, 10000, 100000]
for size in batchSize:
    with tf.device("/cpu:0"):
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
        model.add(Dense(node, activation=activate))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1))
    model = multi_gpu_model(model, gpus=2)
    model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
    history = model.fit(x, y, epochs=5, batch_size=size, validation_split=.1)
    plotMetric(history)
    score = roc_auc_score(y_test, model.predict(x_test))
    print(f"Nodes: {node}")
    print(f"Activation: {activate}")
    print(f"Score: {score}")
    if score > bestScore:
        bestResult['Node'] = node
        bestResult['Activation'] = activate
        bestResult['Optimization'] = optimize
        bestResult['Batch'] = size
        bestResult['Score'] = score
        bestScore = score
print(f"{bestResult}\n")
bestResult = {}
bestScore = -1
node = 100
activate = 'relu'
optimize = 'Adam'
size = 1000
initializers = ['random_normal', 'lecun_normal', 'glorot_normal']
for initiate in initializers:
    with tf.device("/cpu:0"):
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
        model.add(Dense(node, activation=activate))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1))
    model = multi_gpu_model(model, gpus=2)
    model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
    history = model.fit(x, y, epochs=5, batch_size=size, validation_split=.1)
    plotMetric(history)
    score = roc_auc_score(y_test, model.predict(x_test))
    print(f"Nodes: {node}")
    print(f"Activation: {activate}")
    print(f"Score: {score}")
    if score > bestScore:
        bestResult['Node'] = node
        bestResult['Activation'] = activate
        bestResult['Initializer'] = initiate
        bestResult['Batch'] = size
        bestResult['Score'] = score
        bestScore = score
print(f"{bestResult}\n")
bestResult = {}
bestScore = -1
node = 100
activate = 'relu'
initiate = 'glorot_normal'
size = 1000
optimization = ['Adagrad', 'Adadelta', 'Adam', 'Nadam']
for optimize in optimization:
    with tf.device("/cpu:0"):
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
        model.add(Dense(node, activation=activate))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1))
    model = multi_gpu_model(model, gpus=2)
    model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
    history = model.fit(x, y, epochs=5, batch_size=size, validation_split=.1)
    plotMetric(history)
    score = roc_auc_score(y_test, model.predict(x_test))
    print(f"Nodes: {node}")
    print(f"Activation: {activate}")
    print(f"Score: {score}")
    if score > bestScore:
        bestResult['Node'] = node
        bestResult['Activation'] = activate
        bestResult['Initializer'] = initiate
        bestResult['Optimizer'] = optimize
        bestResult['Batch'] = size
        bestResult['Score'] = score
        bestScore = score
print(f"{bestResult}\n")
#Read in 2.5 Million rows. Split into train and test data set
N = 2500000  #Change this line to adjust the number of rows.
data=pd.read_csv("HIGGS.csv", nrows=N, header=None)
test_data=pd.read_csv("HIGGS.csv", nrows=500000, header=None, skiprows=10500000)
y = np.array(data.loc[:,0])
x = np.array(data.loc[:,1:])
x_test = np.array(test_data.loc[:,1:])
y_test = np.array(test_data.loc[:,0])
bestResult = {}
bestScore = -1
node = 100
activate = 'relu'
initiate = 'glorot_normal'
size = 1000
optimize = 'Adadelta'
epoch = 5
with tf.device("/cpu:0"):
    model = Sequential()
    model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
    model.add(Dense(node, activation=activate))
    model.add(Dense(node, activation=activate))
    model.add(Dense(1))
model = multi_gpu_model(model, gpus=2)
model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
history = model.fit(x, y, epochs=epoch, batch_size=size, validation_split=.1)
plotMetric(history)
score = roc_auc_score(y_test, model.predict(x_test))
print(f"Nodes: {node}")
print(f"Activation: {activate}")
print(f"Score: {score}")
if score > bestScore:
    bestResult['Node'] = node
    bestResult['Activation'] = activate
    bestResult['Initializer'] = initiate
    bestResult['Optimizer'] = optimize
    bestResult['Batch'] = size
    bestResult['Score'] = score
    bestScore = score
print(f"{bestResult}\n")
bestResult = {}
bestScore = -1
node = 100
activate = 'relu'
initiate = 'glorot_normal'
size = 1000
optimize = 'Adadelta'
epoch = 20
with tf.device("/cpu:0"):
    model = Sequential()
    model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
    model.add(Dense(node, activation=activate))
    model.add(Dense(node, activation=activate))
    model.add(Dense(1))
model = multi_gpu_model(model, gpus=2)
model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
history = model.fit(x, y, epochs=epoch, batch_size=size, validation_split=.1)
plotMetric(history)
score = roc_auc_score(y_test, model.predict(x_test))
print(f"Nodes: {node}")
print(f"Activation: {activate}")
print(f"Score: {score}")
if score > bestScore:
    bestResult['Node'] = node
    bestResult['Activation'] = activate
    bestResult['Initializer'] = initiate
    bestResult['Optimizer'] = optimize
    bestResult['Batch'] = size
    bestResult['Score'] = score
    bestScore = score
print(f"{bestResult}\n")
bestResult = {}
bestScore = -1
node = 100
activate = 'relu'
initiate = 'glorot_normal'
size = 1000
optimize = 'Adadelta'
epoch = 50
with tf.device("/cpu:0"):
    model = Sequential()
    model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
    model.add(Dense(node, activation=activate))
    model.add(Dense(node, activation=activate))
    model.add(Dense(1))
model = multi_gpu_model(model, gpus=2)
model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
history = model.fit(x, y, epochs=epoch, batch_size=size, validation_split=.1)
plotMetric(history)
score = roc_auc_score(y_test, model.predict(x_test))
print(f"Nodes: {node}")
print(f"Activation: {activate}")
print(f"Score: {score}")
if score > bestScore:
    bestResult['Node'] = node
    bestResult['Activation'] = activate
    bestResult['Initializer'] = initiate
    bestResult['Optimizer'] = optimize
    bestResult['Batch'] = size
    bestResult['Score'] = score
    bestScore = score
print(f"{bestResult}\n")
#Read in 10.5 Million rows. Split into train and test data set
N = 10500000  #Change this line to adjust the number of rows.
data=pd.read_csv("HIGGS.csv", nrows=N, header=None)
test_data=pd.read_csv("HIGGS.csv", nrows=500000, header=None, skiprows=10500000)
y = np.array(data.loc[:,0])
x = np.array(data.loc[:,1:])
x_test = np.array(test_data.loc[:,1:])
y_test = np.array(test_data.loc[:,0])
bestResult = {}
bestScore = -1
node = 100
activate = 'relu'
initiate = 'glorot_normal'
size = 1000
optimize = 'Adadelta'
epoch = 20
with tf.device("/cpu:0"):
    model = Sequential()
    model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
    model.add(Dense(node, activation=activate))
    model.add(Dense(node, activation=activate))
    model.add(Dense(1))
model = multi_gpu_model(model, gpus=2)
model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
history = model.fit(x, y, epochs=epoch, batch_size=size, validation_split=.1)
plotMetric(history)
score = roc_auc_score(y_test, model.predict(x_test))
print(f"Nodes: {node}")
print(f"Activation: {activate}")
print(f"Score: {score}")
if score > bestScore:
    bestResult['Node'] = node
    bestResult['Activation'] = activate
    bestResult['Initializer'] = initiate
    bestResult['Optimizer'] = optimize
    bestResult['Batch'] = size
    bestResult['Score'] = score
    bestScore = score
print(f"{bestResult}\n")
bestResult['Score'] = 0.83107673875398
print(f"{bestResult}\n")
bestResult = {}
bestScore = -1
nodes = [10, 20, 50, 100]
activation = ['relu', 'sigmoid', 'tanh', 'elu', 'selu']
for activate in activation:
    for node in nodes:
        with tf.device("/cpu:0"):
            model = Sequential()
            model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dense(1))
        model = multi_gpu_model(model, gpus=2)
        model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer='adam')
        history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=.1)
        plotMetric(history)
        score = roc_auc_score(y_test, model.predict(x_test))
        print(f"Nodes: {node}")
        print(f"Activation: {activate}")
        print(f"Score: {score}")
        if score > bestScore:
            bestResult['Node'] = node
            bestResult['Activation'] = activate
            bestResult['Score'] = score
            bestScore = score
print(f"{bestResult}\n")
#Search separate activations for each of the two hidden layers.
bestResult = {}
bestScore = -1
nodes = [10, 20, 50, 100]
activation = ['relu', 'sigmoid', 'tanh', 'elu', 'selu']
for node in nodes:
    for activate1 in activation:
        for activate2 in activation:
            with tf.device("/cpu:0"):
                model = Sequential()
                model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
                model.add(Dense(node, activation=activate1))
                model.add(Dropout(0.10))
                model.add(Dense(node, activation=activate2))
                model.add(Dense(1, activation='sigmoid'))
            model = multi_gpu_model(model, gpus=2)
            model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer='adam')
            history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=0.1)
            plotMetric(history)
            score = roc_auc_score(y_test, model.predict(x_test))
            print(f"Nodes: {node}")
            print(f"Activate 1: {activate1}")
            print(f"Activate 2: {activate2}")
            print(f"Score: {score}")
            if score > bestScore:
                bestResult['Node'] = node
                bestResult['Activate1'] = activate1
                bestResult['Activate2'] = activate2
                bestResult['Score'] = score
                bestScore = score
print(f"{bestResult}\n")
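The triple-nested loop above visits every (node, activate1, activate2) combination. `itertools.product` expresses the same sweep with a single loop, which makes adding or dropping a search dimension easier. A sketch using the same grids (just the combinations, no training):

```python
from itertools import product

nodes = [10, 20, 50, 100]
activation = ['relu', 'sigmoid', 'tanh', 'elu', 'selu']

# Every (node, activate1, activate2) triple, visited exactly once.
combos = list(product(nodes, activation, activation))
```

With 4 widths and 5 activations per hidden layer, the sweep covers 4 × 5 × 5 = 100 configurations.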
#Three hidden layers, relu only, widths 10-100.
bestResult = {}
bestScore = -1
nodes = [10, 20, 50, 100]
activation = ['relu']
for activate in activation:
    for node in nodes:
        with tf.device("/cpu:0"):
            model = Sequential()
            model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dense(1, activation='sigmoid'))
        model = multi_gpu_model(model, gpus=2)
        model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer='adam')
        history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=0.1)
        plotMetric(history)
        score = roc_auc_score(y_test, model.predict(x_test))
        print(f"Nodes: {node}")
        print(f"Activation: {activate}")
        print(f"Score: {score}")
        if score > bestScore:
            bestResult['Node'] = node
            bestResult['Activation'] = activate
            bestResult['Score'] = score
            bestScore = score
print(f"{bestResult}\n")
#Three hidden layers, widths 150-300.
bestResult = {}
bestScore = -1
nodes = [150, 200, 250, 300]
activation = ['relu']
for activate in activation:
    for node in nodes:
        with tf.device("/cpu:0"):
            model = Sequential()
            model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dense(1, activation='sigmoid'))
        model = multi_gpu_model(model, gpus=2)
        model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer='adam')
        history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=0.1)
        plotMetric(history)
        score = roc_auc_score(y_test, model.predict(x_test))
        print(f"Nodes: {node}")
        print(f"Activation: {activate}")
        print(f"Score: {score}")
        if score > bestScore:
            bestResult['Node'] = node
            bestResult['Activation'] = activate
            bestResult['Score'] = score
            bestScore = score
print(f"{bestResult}\n")
#Four hidden layers.
bestResult = {}
bestScore = -1
nodes = [10, 20, 50, 100, 150, 200, 250, 300]
activation = ['relu']
for activate in activation:
    for node in nodes:
        with tf.device("/cpu:0"):
            model = Sequential()
            model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dense(1, activation='sigmoid'))
        model = multi_gpu_model(model, gpus=2)
        model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer='adam')
        history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=0.1)
        plotMetric(history)
        score = roc_auc_score(y_test, model.predict(x_test))
        print(f"Nodes: {node}")
        print(f"Activation: {activate}")
        print(f"Score: {score}")
        if score > bestScore:
            bestResult['Node'] = node
            bestResult['Activation'] = activate
            bestResult['Score'] = score
            bestScore = score
print(f"{bestResult}\n")
#Five hidden layers.
bestResult = {}
bestScore = -1
nodes = [50, 100, 150, 200, 250, 300]
activation = ['relu']
for activate in activation:
    for node in nodes:
        with tf.device("/cpu:0"):
            model = Sequential()
            model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dense(1, activation='sigmoid'))
        model = multi_gpu_model(model, gpus=2)
        model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer='adam')
        history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=0.1)
        plotMetric(history)
        score = roc_auc_score(y_test, model.predict(x_test))
        print(f"Nodes: {node}")
        print(f"Activation: {activate}")
        print(f"Score: {score}")
        if score > bestScore:
            bestResult['Node'] = node
            bestResult['Activation'] = activate
            bestResult['Score'] = score
            bestScore = score
print(f"{bestResult}\n")
#Compare optimizers on the five-hidden-layer network.
bestResult = {}
bestScore = -1
nodes = [150, 200, 250]
activate = 'relu'
optimization = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
for optimize in optimization:
    for node in nodes:
        with tf.device("/cpu:0"):
            model = Sequential()
            model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dropout(0.10))
            model.add(Dense(node, activation=activate))
            model.add(Dense(1, activation='sigmoid'))
        model = multi_gpu_model(model, gpus=2)
        model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
        history = model.fit(x, y, epochs=5, batch_size=1000, validation_split=0.1)
        plotMetric(history)
        score = roc_auc_score(y_test, model.predict(x_test))
        print(f"Nodes: {node}")
        print(f"Activation: {activate}")
        print(f"Optimizer: {optimize}")
        print(f"Score: {score}")
        if score > bestScore:
            bestResult['Node'] = node
            bestResult['Activation'] = activate
            bestResult['Optimization'] = optimize
            bestResult['Score'] = score
            bestScore = score
print(f"{bestResult}\n")
#Coarse search over batch size.
bestResult = {}
bestScore = -1
node = 250
activate = 'relu'
optimize = 'Adadelta'
batchSize = [500, 1000, 10000, 100000]
for size in batchSize:
    with tf.device("/cpu:0"):
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1, activation='sigmoid'))
    model = multi_gpu_model(model, gpus=2)
    model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
    history = model.fit(x, y, epochs=5, batch_size=size, validation_split=0.1)  #Use the loop's batch size
    plotMetric(history)
    score = roc_auc_score(y_test, model.predict(x_test))
    print(f"Nodes: {node}")
    print(f"Activation: {activate}")
    print(f"Batch: {size}")
    print(f"Score: {score}")
    if score > bestScore:
        bestResult['Node'] = node
        bestResult['Activation'] = activate
        bestResult['Optimization'] = optimize
        bestResult['Batch'] = size
        bestResult['Score'] = score
        bestScore = score
print(f"{bestResult}\n")
#Finer batch-size search around 1000.
bestResult = {}
bestScore = -1
node = 250
activate = 'relu'
optimize = 'Adadelta'
batchSize = [800, 900, 1000, 1100, 1200]
for size in batchSize:
    with tf.device("/cpu:0"):
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1, activation='sigmoid'))
    model = multi_gpu_model(model, gpus=2)
    model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
    history = model.fit(x, y, epochs=5, batch_size=size, validation_split=0.1)  #Use the loop's batch size
    plotMetric(history)
    score = roc_auc_score(y_test, model.predict(x_test))
    print(f"Nodes: {node}")
    print(f"Activation: {activate}")
    print(f"Batch: {size}")
    print(f"Score: {score}")
    if score > bestScore:
        bestResult['Node'] = node
        bestResult['Activation'] = activate
        bestResult['Optimization'] = optimize
        bestResult['Batch'] = size
        bestResult['Score'] = score
        bestScore = score
print(f"{bestResult}\n")
#Finer batch-size search between 600 and 850.
bestResult = {}
bestScore = -1
node = 250
activate = 'relu'
optimize = 'Adadelta'
batchSize = [600, 700, 750, 800, 850]
for size in batchSize:
    with tf.device("/cpu:0"):
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer='uniform'))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1, activation='sigmoid'))
    model = multi_gpu_model(model, gpus=2)
    model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
    history = model.fit(x, y, epochs=5, batch_size=size, validation_split=0.1)  #Use the loop's batch size
    plotMetric(history)
    score = roc_auc_score(y_test, model.predict(x_test))
    print(f"Nodes: {node}")
    print(f"Activation: {activate}")
    print(f"Batch: {size}")
    print(f"Score: {score}")
    if score > bestScore:
        bestResult['Node'] = node
        bestResult['Activation'] = activate
        bestResult['Optimization'] = optimize
        bestResult['Batch'] = size
        bestResult['Score'] = score
        bestScore = score
print(f"{bestResult}\n")
#Compare kernel initializers.
bestResult = {}
bestScore = -1
node = 250
activate = 'relu'
optimize = 'Adadelta'
batchSize = 800
initializers = ['random_normal', 'random_uniform', 'VarianceScaling', 'lecun_normal',
                'lecun_uniform', 'glorot_normal', 'glorot_uniform']
for initiate in initializers:
    with tf.device("/cpu:0"):
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1, activation='sigmoid'))
    model = multi_gpu_model(model, gpus=2)
    model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
    history = model.fit(x, y, epochs=5, batch_size=batchSize, validation_split=0.1)
    plotMetric(history)
    score = roc_auc_score(y_test, model.predict(x_test))
    print(f"Nodes: {node}")
    print(f"Activation: {activate}")
    print(f"Initializer: {initiate}")
    print(f"Score: {score}")
    if score > bestScore:
        bestResult['Node'] = node
        bestResult['Activation'] = activate
        bestResult['Optimization'] = optimize
        bestResult['Batch'] = batchSize
        bestResult['Initialization'] = initiate
        bestResult['Score'] = score
        bestScore = score
print(f"{bestResult}\n")
#Use the chosen initializer on every layer, not just the input layer.
bestResult = {}
bestScore = -1
node = 250
activate = 'relu'
optimize = 'Adadelta'
batchSize = 800
initiate = 'glorot_uniform'
with tf.device("/cpu:0"):
    model = Sequential()
    model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
    model.add(Dense(node, activation=activate, kernel_initializer=initiate))
    model.add(Dropout(0.10))
    model.add(Dense(node, activation=activate, kernel_initializer=initiate))
    model.add(Dropout(0.10))
    model.add(Dense(node, activation=activate, kernel_initializer=initiate))
    model.add(Dropout(0.10))
    model.add(Dense(node, activation=activate, kernel_initializer=initiate))
    model.add(Dropout(0.10))
    model.add(Dense(node, activation=activate, kernel_initializer=initiate))
    model.add(Dense(1, activation='sigmoid'))
model = multi_gpu_model(model, gpus=2)
model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
history = model.fit(x, y, epochs=5, batch_size=batchSize, validation_split=0.1)
plotMetric(history)
score = roc_auc_score(y_test, model.predict(x_test))
print(f"Nodes: {node}")
print(f"Activation: {activate}")
print(f"Score: {score}")
if score > bestScore:
    bestResult['Node'] = node
    bestResult['Activation'] = activate
    bestResult['Optimization'] = optimize
    bestResult['Batch'] = batchSize
    bestResult['Initialization'] = initiate
    bestResult['Score'] = score
    bestScore = score
print(f"{bestResult}\n")
#Compare longer training runs (epochs).
bestResult = {}
bestScore = -1
node = 250
activate = 'relu'
optimize = 'Adadelta'
batchSize = 800
initiate = 'glorot_uniform'
epochs = [10, 15, 20]
for epoch in epochs:
    with tf.device("/cpu:0"):
        model = Sequential()
        model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dropout(0.10))
        model.add(Dense(node, activation=activate))
        model.add(Dense(1, activation='sigmoid'))
    model = multi_gpu_model(model, gpus=2)
    model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
    history = model.fit(x, y, epochs=epoch, batch_size=batchSize, validation_split=0.1)
    plotMetric(history)
    score = roc_auc_score(y_test, model.predict(x_test))
    print(f"Nodes: {node}")
    print(f"Activation: {activate}")
    print(f"Epochs: {epoch}")
    print(f"Score: {score}")
    if score > bestScore:
        bestResult['Node'] = node
        bestResult['Activation'] = activate
        bestResult['Optimization'] = optimize
        bestResult['Batch'] = batchSize
        bestResult['Initialization'] = initiate
        bestResult['Epochs'] = epoch
        bestResult['Score'] = score
        bestScore = score
print(f"{bestResult}\n")
#Train the final model with all tuned settings.
bestResult = {}
bestScore = -1
node = 250
activate = 'relu'
optimize = 'Adadelta'
batchSize = 800
initiate = 'glorot_uniform'
epoch = 20
with tf.device("/cpu:0"):
    model = Sequential()
    model.add(Dense(node, input_dim=x.shape[1], kernel_initializer=initiate))
    model.add(Dense(node, activation=activate))
    model.add(Dropout(0.10))
    model.add(Dense(node, activation=activate))
    model.add(Dropout(0.10))
    model.add(Dense(node, activation=activate))
    model.add(Dropout(0.10))
    model.add(Dense(node, activation=activate))
    model.add(Dropout(0.10))
    model.add(Dense(node, activation=activate))
    model.add(Dense(1, activation='sigmoid'))
model = multi_gpu_model(model, gpus=2)
model.compile(loss='binary_crossentropy', metrics=['acc'], optimizer=optimize)
history = model.fit(x, y, epochs=epoch, batch_size=batchSize, validation_split=0.1)
plotMetric(history)
score = roc_auc_score(y_test, model.predict(x_test))
print(f"Nodes: {node}")
print(f"Activation: {activate}")
print(f"Score: {score}")
if score > bestScore:
    bestResult['Node'] = node
    bestResult['Activation'] = activate
    bestResult['Optimization'] = optimize
    bestResult['Batch'] = batchSize
    bestResult['Initialization'] = initiate
    bestResult['Epochs'] = epoch
    bestResult['Score'] = score
    bestScore = score
print(f"{bestResult}\n")
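Every search above is scored with `roc_auc_score`, the area under the ROC curve: the probability that a randomly chosen signal event receives a higher prediction than a randomly chosen background event. A NumPy sketch of the equivalent rank-based (Mann-Whitney) computation, on toy labels and scores (ties ignored for brevity):

```python
import numpy as np

def auc_by_ranks(y_true, y_score):
    """ROC AUC via the Mann-Whitney U statistic (no sklearn needed)."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)  # 1-based ranks, ascending score
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y_true = np.array([1, 0, 1, 0, 1])
y_score = np.array([0.9, 0.4, 0.8, 0.6, 0.3])
```

On these toy values four of the six signal/background pairs are ranked correctly, so the AUC is 4/6, matching what `roc_auc_score` would report.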